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An alternative to commonly used domain wall fermions is presented. Some rigorous bounds on the condition 
number of the associated linear problem are derived. On the basis of these bounds and some experimentation it 
is argued that domain wall fermions will in general be associated with a condition number that is of the same 
order of magnitude as the product of the condition number of the linear problem in the physical dimensions by the 
inverse bare quark mass. Thus, the computational cost of implementing true domain wall fermions using a single 
conjugate gradient algorithm is of the same order of magnitude as that of implementing the overlap Dirac operator 
directly using two nested conjugate gradient algorithms. At a cost of about a factor of two in operation count it 
is possible to make the memory usage of direct implementations of the overlap Dirac operator independent of the 
accuracy of the approximation to the sign function and of the same order as that of standard Wilson fermions. 
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1. INTRODUCTION 

This review will start with a sketch of the 
kinematical-algebraic aspects of the overlap Dirac 
operator in the vector-like context. Next comes a 
general discussion of numerical implementations 
of the overlap Dirac operator. Section 2 is de- 
voted to an alternative domain wall model. This 
model is domain-wall like in the sense that an 
extra dimension is added and the computation 
of the light fermion propagator requires a single 
conjugate gradient procedure, albeit for a matrix 
representing fermions in five dimensions. On the 
other hand, the model is designed so that its out- 
put in exact arithmetic is the same as that of an 
iterative method implementing the overlap Dirac 
operator directly. The latter method requires a 
two-level nested conjugate gradient procedure. In 
section 3 rigorous results on spectral properties of 
our model are presented and in section 4 these re- 
sults are compared to true domain wall fermions. 
The main conclusion is that nested procedures 
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typically are more efficient than implementations 
based on domain-walls. This counter-intuitive 
conclusion is explained by the condition number 
of the domain wall problem being the product 
of the condition number of the four dimensional 
problem by the inverse bare quark mass. The lat- 
ter two factors govern individually the two nested 
cycles in direct implementations. In Section 5 it 
is shown that one can eliminate the requirement 
of linearly growing memory consumption for in- 
creasing accuracy at the cost of a factor of two in 
operations. In practice, the factor of two is often 
not felt because the implementation is memory 
bound. Section 6 contains some final comments. 

A large part of this talk is about work done in 
collaboration with Rajamani Narayanan |^]. 

1.1. Algebraic structure 

The overlap formulation |^ of vector- like gauge 
theories on the lattice preserves chiral symmetries 
exactly, a property thought to be unattainable for 
many years. If one adds to the Ginsparg Wilson 
relation, as originally formulated in 1982 a 
requirement of 75-hermiticity, the combination is 
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equivalent to the overlap at the algebraic level. 
The set-up has been of interest to mathematicians 
much earlier as a generalization of the concept of 
angle between two straight intersecting lines in a 
plane. The plane is generalized to a vector space 
over the complex numbers and the two lines to 
two subspaces of the vector space |^. In our ap- 
plication the dimension of the vector space will 
always be even, but the subspaces can have un- 
equal dimensions - when they do, the angle con- 
cept looses its meaning. The Kato setup is also 
meaningful for infinite dimensional Hilbert spaces 
and has other applications to physics 

The subspaces are defined by projectors onto 
them, in our case the projectors are hermitian 
and replaced by linearly related reflections, e and 
e'. = 1 and e' ^ = 1. The —1 eigenspaces 
of e and e' are the subspaces in question. They 
are spanned by orthonormal sets denoted by {wfc} 
and {v'l.,} with evk + vu = e'wj., + v'^., = 0. The 
information about the relative positioning of the 
subspaces is contained in the overlap matrix M: 



(1) 



The coarsest measure of relative orientation is ob- 
viously I detM|. The main identity |^ is 



detA^I^ = detDo, Do = 



1 



(2) 



In our case e' — 75 and e = sign(_ffn/), where 
Hw = 75 a-nd D^/ is any lattice version of the 
Dirac operator describing fermions with negative 
mass of order i where a is the lattice spacing, e' 
can be thought of as describing Dirac fermions of 
positive infinite mass. Dy^ is restricted by the re- 
quirement that Hy^ be hermitian. The simplest 
choice, which minimizes operations in the compu- 
tation of the action of Dw on a vector, is to pick 
Dw as the Wilson lattice Dirac operator. It is 
possible that when all practical aspects of a sim- 
ulation are taken into account a seemingly more 
complicated choice would pay better off. 

Hw depends on the background gauge field 
represented by a collection of unitary link ma- 
trices \Jp_. Hw has the following properties: 

• Hw is too large to be stored in the com- 
puter memory in its entirety, but it is 



sparse, so its action on vectors can be im- 
plemented. 

• The spectrum of Hw is bounded by a finite 
bound that docs not depend on the gauge 
background. 

• If all products of gauge matrices on links 
around plaquettes. Up, obey ||t/p — 1|| < 77, 
where 77 is a small positive number indepen- 
dent of the gauge background, the spectrum 
of H^ is also bounded by a finite number, 
independently of the gauge- field. 

1.2. Basic numerical issues 

The calculation of the action of e on a vector 
must use sparse matrix techniques. The bound- 
edness of the spectrum means that the sign func- 
tion needs to be approximated accurately only 
in two finite segments symmetrical about zero, 
which contain the entire spectrum of Hw, for any 
gauge background allowed by the pure gauge ac- 
tion. The main strain on the approximation oc- 
curs around zero, where the gap turns out in prac- 
tice to be very small. This is where essentially all 
the numerical cost goes. 

There are two main approaches to the approx- 
imate implementation of the sign function: One 
is the direct approach (overlap), and the other is 
indirect (domain wall fermions). 

In the direct approach one looks for a rational 
approximation for the sign function in the range 
defined by the bounds on the spectrum of Hw- 
The rational approximation is written as a sum 
of pole terms. The crucial point is that the ac- 
tion of each pole term on a vector need not be 
calculated separately: rather, the action of all 
terms can be calculated simultaneously, in one 
single pass through the conjugate gradient algo- 
rithm I ^ . The approximation gets more accurate 
when the number of pole terms is increased. Set- 
tling for a certain number of terms, n, one obtains 
an approximation for e. 



(3) 



The "masses" have to be non-negative, but 
the weights Wg can take either sign. In practice 
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we need the action of ^j- on a vector, so the action 
of e" is needed many times for one evaluation. 
We end up with two levels of nested conjugate 
gradient algorithms. 

In the context of domain wall fermions an ap- 
proximation for the sign function is not con- 
structed directly. Rather, one invents a larger 
problem, defined by a matrix H. H is determined 
by Hw but its dimensions are n-times larger. H 
is still sparse. One then arranges that in a given 
subspace of the larger space H acts on one has 



specific subspace 



Ho^j.Do (4) 



Above, Do and Ho can differ from eq. (2) by 
terms that disappear in the continuum limit. The 
exact form is a matter of convenience. One ends 
up needing to do only one inversion, using a single 
conjugate gradient algorithm (CG). In the most 
common applications of domain wall fermions one 
uses a construction of H due to Kaplan Q . His- 
torically, Kaplan's formulation came first, and 
was a prime motivator for subsequent develop- 
ments. 

1.3. Motivation for model 

The main difference between the two methods 
is that one has a nested CG in one and a single CG 
in the other, but employing a larger matrix. The 
objective of my work with Narayanan |^ was to 
invent a model containing a version of H which is 
close to Kaplan's domain wall fermions, but also 
to the rational approximation so that a compar- 
ison of numerical costs may be carried out using 
rigorous methods. We wanted the output of ei- 
ther method to be the same in exact arithmetic, 
so it would be only the ways of doing the calcu- 
lation that differ. One way is similar to domain 
wall fermions and the other is direct. 

2. THE ALTERNATIVE MODEL 

Let -0 be the Dirac field describing a light 
quark. We wish to end up with an effective action 
for ip given by: 



Sesilp) = -V' 



1 



1 



-75 



^. (5) 



To be specific, we choose 
1 



1 



1 c^^^w + 
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with 



sin0„ 



i) 

S — 1, 2, 71. 
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This rational approximation can be easily re- 
placed by others. 

We now add 2n extra fields, XsiXs and (l)s,4's, 
pack them together with the light field into a com- 
bined field, ^ = (t/J, (^1, Xn, (f'n 

). The total 

action is 



Introduce a„ = ^/ ^ and b = The 
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trix H is given by: 
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Our goal is attained by the model because of the 
identity: 



n deticlH^ + si) 



d^#e V =="»'+H^y (9) 



The prefactor can be canceled by adding pseud- 
ofermions, which will be decoupled in the s index. 

Let us now roughly estimate computational 
costs. In the direct approach the number of inner 
iterations is approximately given by the condition 
number of Hyy, ^iHw) (which is the square root 
of the condition number of H"^). The number of 
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outer iteration goes as the condition number of 
Do which is roughly ^. Thus, the total number 

of H\Y'4' actions is roughly given by niihll^ 

In the domain wall version the number of H'^ 
operations is governed by k{H). Every H'i' op- 
eration counts roughly as 2n Hw operations. We 
need to estimate k{H) in terms of n{Hw) and 
/i. To find n{H) we need the maximal and mini- 
mal eigenvalues of . The basic trick for finding 
Amax(-ff^) and Amin(-ff^) IS the derivation of an 
exact formula for det(iJ — z). 

det(F -z) = 

n 

]J det [{clHw - z){Hw + z)+ s^] x 

s=l 

det 
Here 



l + li 1 

^:^75 + 2 H 



fn{Hw,z) 



(10) 



fn{Hw, z) = — 

n 



1 ^ 



1 



7[ clHw 



(11) 



Hw+z 

Eigenvalues of H are roots of the equation 
det(_ff — z) = 0. All the roots come from roots of 
the last factor. (Roots of the factors in the prod- 
uct over s are canceled by poles in the last factor. 
So, the spectrum of H is determined by the last 
factor.) We write 

U{Hw, z) = Sn{Hw - z, ^ \ ^ - z). (12) 



H\Y + z 



where, 



Sn{a,b) 



n ^ aci + bs^. ' 

s=l " ^ 



(13) 



The point is that one has explicit formulae for S, 
so long a and b are real. If a6 > we have 

, sign(fo) , , . 

Sn\a, b) = — tanh(na;) 
Vab 

LO = log (|1 + v^oAI/ll - ^/a/b\) ■ 
If ab < 0, 

sign(&) 

bn{ci, b) = — , tan(na;), 
V—ab 



(14) 
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={\^i^r^h)i{\-i^r^h). (15) 

These formulae make the n-dependence explicit. 



3. RIGOROUS RESULTS 

Let me describe the main idea for deriving 
bounds on the eigenvalues of H. We are looking 
for zero modes of: 



-75 + Z-\ —fn{Hw, Z) 



2 

_ 1 + M 



15+X\z) 



(16) 



Conditions on z, depending on and spectral 
properties of Hw, limit the eigenvalues of X'{z) 
to some range. Suppose 5*0 is a zero mode of 
X{z). It can exist only if \^-^\ is in the range 
of eigenvalue of {X^z))^. But, fn{Hw,z) is 
bounded from above and below for ^; = ±oo and 
for z = 0. So, it is possible to exclude vicinities 
of z = and |z| = oo. 

In this way we obtain rigorous bounds: 



and 



^{Hw)Pn{Hw, m) 



(17) 



(18) 



Pn is a function of the spectrum of Hw and of 
fi. It is well defined and calculable but clumsy 
to write down. For ranges of practical interest 

Pn ~ 1- 

This leads us to: 



k{H) < 



k{Hw) 



1 + 



2n pn{Hw,lj) 



(19) 



where the last factor is close to unity in practice. 

The meaning of the result is quite obvious: We 
need to overcome both the lattice artifact of hav- 
ing at times almost zero modes for Hw and the 
physical effect of small quark mass (/i, when close 
to zero, gives the bare lattice quark mass). 

Clearly our conclusion speaks in favor of the 
direct approach, if we assume the bounds to be 
typically saturated (the action of H is 2n more 
expensive than that of Hw)- But, before jumping 
to this conclusion we should try to decide how 
good our bounds arc. It turns out they are quite 
good. The bound on the Amax(-ff^) is typically 
saturated. As far as Xmia{H^) goes we can prove 

Ainin(^f^) < 
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M 1 
—75 + - 



-s„{Hi 



W) 



(20) 



One cannot do better than Ai„in(-f^^) either. For 
H small enough and n large enough 



TT 

2n 
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Thus either a very small eigenvalue of or 
a very small mass are guaranteed to make the 
smallest eigenvalue of very small too. The 
lower boimd in cq. (18), if optimal, indicates 
that it is possible for the smallest eigenvalue of 
H"^ to be as small as the product of these two 
small numbers. 

4. COMPARISON TO DOMAIN WALL 
FERMIONS 

For true domain wall fermions with action D 
we can prove only one kind of bound: 

2 



min 



1 



/i 



n 

+ (l + /z)^(--4 



An approximate analysis yields 



(22) 



(23) 



Upper bounds on Amax(-D^-D) are of order 10 in 
practice. 

We conclude that the matrices describing the 
version of domain wall fermions used in large scale 
simulations have conditions numbers that behave 
similarly to the condition number of our model. 

In a modest numerical experiment on a two- 
dimensional U{1) model we compared domain 
wall fermions using our alternative model, true 
domain wall fermions and the direct overlap ap- 
proach. The pure gauge action was of the single 
plaquette type, with a lattice coupling /3 = 4 and 
the lattice size was 8x8. 

We performed the calculations needed to ob- 
tain the condensate (V'phys'0phys)- We required 
the norm of the residual to go down to 10~^ and 
counted operations of Hw on a vector. The value 
of n used was 20. We did not use preconditioning 
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Figure 1 . A comparison of the number of opera- 
tions of Hiv for the inversion of the fcrmionic op- 
erator in three cases: true domain wall fermions, 
the direct implementation of the rational approx- 
imated sign function and our alternative domain 
wall fermions. 



in any of the three methods. Results were ob- 
tained using 20 gauge field configurations. Figure 
1 shows the number of if^iA-on-vector operations 
as a function of quark mass. 

The comparison between the direct overlap and 
the alternative method is simple because, by de- 
sign, the numbers one would get in exact arith- 
metic are the same in either method. Comparing 
to true domain wall fermions is more difficult be- 
cause one needs to match parameters to get sim- 
ilar physics, and this is ambiguous. For example 
we set the Wilson mass parameter in H\y to —1.5 
in all three cases and used the same mass param- 
eter n although this certainly is not correct for 
very small fx. 

We found that the best was to use the di- 
rect overlap approach, but this experiment was 
very limited and one should not immediately 
draw conclusions about numerical QCD. How- 
ever, there is enough evidence by now to scru- 
tinize seriously the question whether it is worth 
investing large scale computational resources into 
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true domain wall fermions given that there is a 
direct overlap alternative. In particular we have 
employed no preconditioning and ignored many 
other options of optimization. 

Theoretically, one is more comfortable with 
overlap fermions, since they differ from other four 
dimensional fermions just by a more complicated 
action. If the domain wall approach were more 
effective computationally one could have the best 
of the two worlds by using our alternative do- 
main wall fermions. Whichever the case may be, 
it is hard to argue in favor of true domain wall 
fermions. 

5. DOUBLE PASS ALGORITHM 

Until now we concerned ourselves only with 
counting operations. But, as is well known, this 
is only one aspect of a computation. Memory re- 
quirements and access patterns matter as much, 
or more, depending on architecture. Until now all 
methods required a factor of n times more mem- 
ory than on ordinary Wilson fermion code would. 

In the direct approach there is an option to 
trade a factor of order 2 in operations for reduc- 
ing the memory needed to that of ordinary Wilson 
fermions. Basically, this is possible because the 
heart of the code is a conjugate gradient proce- 
dure, and only a certain linear combination of the 
n vectors that are acted on is needed. 

The conjugate gradient procedure is closely re- 
lated to the Lanczos scheme for bringing a hermi- 
tian matrix to tri-diagonal form. It is well known 
that in the Lanczos scheme, if one wishes ad- 
ditional information (for example one needs an 
eigenvector) one can avoid large memory con- 
sumption by going through the algorithm twice, 
once to collect the coefficients and a second time 
to accumulate the needed data. The same works 
here 0. 

In the multishift scheme one iterates over 
an index i updating n vectors xf, starting from 
an input vector b. At each iteration these vectors 
are determined by a few s-dependent scalars and 
by three s-independent vectors that make up the 
core conjugate algorithm, w,r,p. Among them, r 
is the residual. When the iteration is stopped we 
have n vectors and they are summed into y. 



our approximation to the vector eb 

n 

y = Y,w'x' (24) 

s=l 

Actually, the iteration must be effectively stopped 
after a different number of steps for each s, since 
for higher values of s the "mass" is larger and, as 
a result, the convergence much faster. Clearly, y 
is made out of the basic Krylov vectors generated 
in the core conjugate gradient; the single reason 
we need to store n vectors is that the components 
of y in the Krylov basis at step i in the iteration 
are not yet known because they depend also on 
steps i + l,i + 2,.. .. 

The idea is now to use a first pass to calcu- 
late the needed conjugate gradient scalars which 
are i dependent but not s-dependent. Using 
these scalars we can now compute in an iteration 
storing scalars only the extra s and i-dependent 
scalars needed for implementing the multishift. 
It is possible now to also compute a set of s- 
independent scalars Ri which have the property 
that 

yi+i = ^ RkTk (25) 

fc=0 

becomes at the last iteration the desired approx- 
imation for y. But, we need the vectors again 
so we need to run the basic conjugate gradient 
again. Hence, a second pass is required, but one 
needs to store only four large vectors for any n. 

A more detailed description of the algorithm 
can be found in 0|. The surprise was that with 
a code written in a higher level language the two 
pass version can actually run faster than the sin- 
gle pass version, by about 30 percent. In the two 
pass version both the operations count and the 
memory usage are independent on n ! The speed 
up is certainly a surprise and must be strongly 
machine dependent. But, it turns out, that one 
gets the same amount of speed-up on an SGI 
O2000 as on a Pentium III PC §. 

6. FINAL COMMENTS 

Let us summarize roughly the situation we were 
looking at when we began to work on reference : 
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Although an approach based on the overlap Dirac 
operator looked theoretically cleaner, true do- 
main wall fermions were more attractive numer- 
ically. Our analysis has led us to the conclusion 
that there is no evidence that true domain wall 
fermions have even a numerical advantage. 

In all cases we looked at, one faces a problem 
related to almost zero modes of Hw- This re- 
quires large numbers of extra fields in order to 
preserve chirality. It also affects adversely the 
condition numbers. Whichever method we use, 
the worst case condition numbers are a product 
of the inverses of two main scale ratios: The first 
is the scale of the small eigenvalues of divided 
by an upper bound of the order of 5-10 in lattice 
units. The second scale ratio is the lattice phys- 
ical quark mass squared divided by a number of 
order unity. Each small scale ratio slows down in- 
version independently and the effect compounds 
in the worst case. 

Thus, as far as we can see, at the numerical 
level, there are no a priori advantages to choosing 
true domain wall fermions over overlap fermions 
in the context of QCD. In both formulations one 
faces similar numerical obstacles, and the overlap, 
to say the least, does not fare any worse than do- 
main wall fermions. At the analytical level we are 
convinced that an approach based on the overlap 
(or any other efficient replacement of the over- 
lap Dirac operator that might be found in the 
future) is superior at presently attainable gauge 
couplings in numerical QCD. Perturbation theory 
is more transparent to interpret and technically 
less complex in the overlap version. The chiral- 
ity violating effects associated with the number of 
extra fields are much more explicit and therefore 
their impact should be easier to trace through. 

Algorithmically we have been looking only un- 
der the lamp post and we are far from having 
exhausted the options there. Nevertheless, I feel 
that true domain wall fermions are an inefficient 
way to incorporate the new ideas about lattice 
chirality into practical QCD simulations. For 
other recent work on similar issues the reader is 
referred to ||] . 

I would like to urge you to use your imagina- 
tion: there must exists much better ways than 
the ones we have tried until now. 
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